How to compress a redundant file?


000000000000000001001001000000000000000000100001000101000000 
000101000000000000000000000100000000010000000000000100000110 
000010000000000000001000000000001000010000010000000000000000 
000010000001000000000001000000000000000000000000010000000000 
000000010001000000100000010010000000000001000000001000000000 
000100100001010100000011000010000000000001000000000000000000 
000000000000000000000000010100000010000000000000100010100001 
100000000010000000000000000000000000000000000000000000000000 
010000000000000100000000000000000100001000000000110110000000 
101000101000000000000000000000000000100010100000100001000000 


e.g. N = 1000 tosses of a bent coin with p_1 = 0.1


How to measure information content? 


Claims: 1. The Shannon information content of an outcome 
\[ h(x = a_i) = \log_2 \frac{1}{p_i} \]

is a sensible measure of information content. 


2. The entropy 
\[ H(X) = \sum_x P(x) \log_2 \frac{1}{P(x)} \]


is a sensible measure of expected information 
content. 
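
As a quick numerical check of claims 1 and 2, the sketch below evaluates the information content and entropy for the bent coin with p_1 = 0.1. The function names are illustrative and not taken from the lecture.

```python
import math

def information_content(p):
    """Shannon information content, in bits, of an outcome with probability p."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Entropy, in bits, of a distribution given as a list of probabilities."""
    # Outcomes with zero probability contribute nothing to the sum.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

p1 = 0.1                                # probability of a 1 for the bent coin
h_one = information_content(p1)         # about 3.32 bits: a 1 is surprising
h_zero = information_content(1 - p1)    # about 0.15 bits: a 0 is expected
H = entropy([p1, 1 - p1])               # about 0.47 bits per toss
print(h_one, h_zero, H)
```

The rare outcome carries far more information than the common one, and the entropy is the probability-weighted average of the two.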


3. Source coding theorem — 
N outcomes from a source X can be compressed 
into roughly N H(X) bits. 


Source coding theorem — 


N outcomes from a source X can be compressed 
into roughly N H(X) bits.


Proved by counting the typical set 


When a source X 
produces N independent outcomes 


\[ \mathbf{x} = x_1 x_2 \ldots x_N \]
this string is very likely to be one of the
~ 2^{N H(X)} typical outcomes,
all of which have probability ~ 2^{-N H(X)}
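
To make the two approximations concrete, here is a minimal sketch for the bent coin (N = 1000, p_1 = 0.1). It takes a string with exactly N p_1 ones as the prototype of a typical string, which is an assumption made for illustration rather than the lecture's definition of the typical set.

```python
import math

def binary_entropy(p):
    """Entropy, in bits, of a coin that shows a 1 with probability p."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

N, p1 = 1000, 0.1
H = binary_entropy(p1)

# A string with exactly r = N * p1 ones stands in for a typical string.
r = round(N * p1)
log2_prob = r * math.log2(p1) + (N - r) * math.log2(1 - p1)

# Number of distinct strings with exactly r ones, on a log2 scale.
log2_count = math.log2(math.comb(N, r))

print(N * H)         # about 469 bits
print(-log2_prob)    # matches N * H when r = N * p1 exactly
print(log2_count)    # a few bits below N * H
```

Each such string has probability 2^{-N H(X)}, and the number of them is within a few bits of 2^{N H(X)} on a log scale.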


Example: the bent coin 


000000000000000001001001000000000000000000100001000101000000 
000101000000000000000000000100000000010000000000000100000110 
000010000000000000001000000000001000010000010000000000000000 
000010000001000000000001000000000000000000000000010000000000 
000000010001000000100000010010000000000001000000001000000000 
000100100001010100000011000010000000000001000000000000000000 
000000000000000000000000010100000010000000000000100010100001 
100000000010000000000000000000000000000000000000000000000000 
010000000000000100000000000000000100001000000000110110000000 
101000101000000000000000000000000000100010100000100001000000 


How we won the bent coin lottery


To have a 99.99% chance of winning, 
we bought all the typical tickets 


[Figure: number of tickets in the typical set]
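
The number of tickets needed can be estimated numerically. The sketch below assumes the bent coin of the example (N = 1000, p_1 = 0.1) and treats the typical tickets as all strings whose number of ones lies in a band around N p_1. The band is widened until it holds 99.99% of the probability. That band definition is an assumption made for illustration.

```python
import math

N, p1 = 1000, 0.1
target = 0.9999

def prob_r_ones(r):
    """Binomial probability that a length-N string contains exactly r ones."""
    log_p = (math.log(math.comb(N, r))
             + r * math.log(p1) + (N - r) * math.log(1 - p1))
    return math.exp(log_p)

# Widen a band of ones-counts around the mean until it holds the target mass.
centre = round(N * p1)
width = 0
while True:
    lo, hi = max(0, centre - width), min(N, centre + width)
    covered = sum(prob_r_ones(r) for r in range(lo, hi + 1))
    if covered >= target:
        break
    width += 1

# Every string whose ones-count falls in the band is a ticket we buy.
tickets = sum(math.comb(N, r) for r in range(lo, hi + 1))
log2_tickets = math.log2(tickets)
print(lo, hi, covered, log2_tickets)
```

The value of log2_tickets comes out far below N = 1000, so the tickets bought are a vanishing fraction of the 2^N possible tickets even though they win almost every time.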


Gaussian distribution 


P(x > 1σ)
P(x > 2σ)
P(x > 2.3σ)
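
These tail probabilities can be evaluated with the complementary error function. The sketch assumes each expression means the one-sided probability that a Gaussian variable lies more than k standard deviations above its mean, and it reads the third threshold as 2.3 standard deviations. Both readings are assumptions, since the slide gives no values.

```python
import math

def upper_tail(k):
    """P(x > mean + k * sigma) for a Gaussian variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1, 2, 2.3):
    print(k, upper_tail(k))
```

Under these assumptions the three tails are roughly 0.16, 0.023 and 0.011.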


How to compress a redundant file, practically?


000000000000000001001001000000000000000000100001000101000000 
000101000000000000000000000100000000010000000000000100000110 
000010000000000000001000000000001000010000010000000000000000 
000010000001000000000001000000000000000000000000010000000000 
000000010001000000100000010010000000000001000000001000000000 
000100100001010100000011000010000000000001000000000000000000 
000000000000000000000000010100000010000000000000100010100001 
100000000010000000000000000000000000000000000000000000000000 
010000000000000100000000000000000100001000000000110110000000 
101000101000000000000000000000000000100010100000100001000000 


Emma Woodhouse, handsome, clever, and rich, with a 
comfortable home and happy disposition, seemed to unite some 
of the best blessings of existence; and had lived nearly 
twenty one years in the world with very little to distress 
or vex her. She was the youngest of the two daughters of a 
most affectionate, indulgent father; and had, in consequence 
of her sister's marriage, been mistress of his house from a
very early period. Her mother had died too long ago for her 
to have more than an indistinct remembrance of her caresses; 
and her place had been supplied by an excellent woman as 
governess, who had fallen little short of a mother in 
affection. Sixteen years had Miss Taylor been in Mr
Woodhouse's family, less as a governess than a friend, very 
The symbol-code supermarket
SUMMARY 4




Gibbs inequality 


\[ \sum_i p_i \log \frac{p_i}{q_i} \ge 0 \]
[equality only if q = p]
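
A small numerical illustration of the Gibbs inequality: the sketch computes the relative entropy between two distributions, once with q different from p and once with q = p. The distributions are illustrative choices, not taken from the lecture.

```python
import math

def relative_entropy(p, q):
    """Sum of p_i * log2(p_i / q_i); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.1, 0.9]   # the bent coin
q = [0.5, 0.5]   # a fair coin, used as the mismatched distribution

d_pq = relative_entropy(p, q)   # strictly positive because q != p
d_pp = relative_entropy(p, p)   # exactly zero because q = p
print(d_pq, d_pp)
```

Here d_pq is about 0.53 bits, which is the extra expected length per symbol paid for coding the bent coin as if it were fair.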


Symbol code summary 


• unique decodeability
  requires \( \sum_i 2^{-l_i} \le 1 \) (Kraft inequality)
  [if equality holds, the code is complete]
• prefix codes and binary trees
• whenever a code achieved L = H, it had
  codelengths l_i equal to the information contents
  \( l_i = \log_2 \frac{1}{p_i} \)
• optimal symbol codes
  The optimal symbol code's expected length L satisfies
  \( H(X) \le L < H(X) + 1 \)
• Huffman algorithm
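
The points in this summary can be checked on a small example. The sketch below is one straightforward way to compute Huffman codelengths with a priority queue, followed by a check of the Kraft sum and the expected length. The ensemble is an illustrative choice, not one from the lecture.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return Huffman codeword lengths for a list of symbol probabilities."""
    # Heap entries: (probability, tie-breaker, symbols beneath this node).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        # Repeatedly merge the two least probable nodes.
        p_a, _, syms_a = heapq.heappop(heap)
        p_b, _, syms_b = heapq.heappop(heap)
        # Merging adds one bit to the codeword of every symbol beneath them.
        for s in syms_a + syms_b:
            lengths[s] += 1
        heapq.heappush(heap, (p_a + p_b, counter, syms_a + syms_b))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]   # illustrative ensemble
lengths = huffman_lengths(probs)
kraft_sum = sum(2.0 ** -l for l in lengths)
L = sum(p * l for p, l in zip(probs, lengths))
H = sum(p * math.log2(1 / p) for p in probs)
print(lengths, kraft_sum, L, H)
```

For this ensemble every probability is a power of 1/2, so the codelengths equal the information contents, the Kraft sum is 1 (the code is complete) and L = H. For the bent coin with probabilities 0.1 and 0.9, the same function gives both symbols a length of 1, so L = 1 while H is about 0.47: still inside the bound, but far from the entropy.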


Results stated without proof (see Chapter 5) 


• Kraft inequality
  \( \sum_i 2^{-l_i} \le 1 \)
• Gibbs inequality
  \( \sum_i p_i \log \frac{p_i}{q_i} \ge 0 \)
• Right hand side of the symbol code source coding theorem
  \( H(X) \le L(C,X) < H(X) + 1 \)
• Huffman coding is optimal


Project: 
Invent a compressor and uncompressor for a 
source file of N = 10,000 bits, each having probability 
f = 0.01 of being a 1.
Implement them and/or 
estimate how well your method works. 
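
One possible starting point for the project, offered as a sketch and not as the intended solution: encode the length of each run of 0s with an Elias gamma code. The choice of gamma coding, the function names and the assumption that the decoder knows N in advance are all illustrative.

```python
import math
import random

def gamma_encode(n):
    """Elias gamma codeword for an integer n >= 1, as a string of '0'/'1'."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def compress(bits):
    """Encode each run of 0s (ended by a 1 or by end of file) with a gamma code."""
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            out.append(gamma_encode(run + 1))
            run = 0
    out.append(gamma_encode(run + 1))   # final run of 0s, possibly empty
    return "".join(out)

def uncompress(code, n_bits):
    """Invert compress(); the file length n_bits is assumed known to the decoder."""
    out, pos, i = [], 0, 0
    while True:
        zeros = 0
        while code[i] == "0":           # unary prefix gives the codeword size
            zeros += 1
            i += 1
        run = int(code[i:i + zeros + 1], 2) - 1
        i += zeros + 1
        if pos + run == n_bits:         # the last run has no terminating 1
            out.append("0" * run)
            return "".join(out)
        out.append("0" * run + "1")
        pos += run + 1

random.seed(0)
N, f = 10_000, 0.01
source = "".join("1" if random.random() < f else "0" for _ in range(N))
code = compress(source)
entropy_bits = N * (f * math.log2(1 / f) + (1 - f) * math.log2(1 / (1 - f)))
print(len(code), entropy_bits, uncompress(code, N) == source)
```

The entropy of the source is about 808 bits, which sets the target. This scheme is much shorter than the raw 10,000 bits but does not reach that target, which leaves room for a better invention.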


Other recommended exercises: 5.22, 5.26, 5.27 
Reading: Chapters 1, 2, 4, 5.
Advance reading: Chapters 8, 9, 10.